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Ensemble Observability of Linear Systems 
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Abstract —We address the observability problem for ensembles 
that are described by probability distributions. The problem is 
to reconstruct a probability distribution of the initial state from 
the time-evolution of the probability distribution of the output 
under a classical finite-dimensional linear system. We present 
two solutions to this problem, one based on formulating the 
problem as an inverse problem and the other one based on 
reconstructing all the moments of the distribution. The first 
approach leads us to a connection between the reconstruction 
problem and mathematical tomography problems. In the second 
approach we use the framework of tensor systems to describe 
the dynamics of the moments which leads to a more systems 
theoretic treatment of the reconstruction problem. Furthermore 
we show that both frameworks are inherently related. The appeal 
of having two dual view points, the first being more geometric 
and the second one being more systems theoretic, is illuminated 
in several examples of theoretical or practical importance. 

Index Terms —Observability, ensemble control, tomography, 
moment dynamics, polynomial systems 

I. INTRODUCTION 

T he classical question of observability asks whether it 
is possible to reconstruct the initial state of a finite¬ 
dimensional system via the knowledge of the evolution of the 
outputs, as introduced by R. E. Kalman in [O, see also IIH. The 
concept of observability has become one of the fundamental 
concepts of modem control theory. In this paper we address a 
novel yet very natural extension to this problem in which we 
move from points in finite-dimensional space to probability 
distributions. We ask ourselves under which conditions it is 
possible to reconstmct a non-parametric probability distri¬ 
bution of initial states when given only knowledge of the 
evolution of the probability distribution of outputs. This basic 
and fundamental problem is not only of interest in its own 
right but can also be considered a theoretical foundation for 
state estimation problems for so-called ensembles appearing 
in a variety of different fields. 

In many fields such as process engineering, cell biology, or 
quantum systems, one encounters large populations of systems 
that are governed by the same dynamical process, but which 
have quantities that are distributed among the population, 
and which can only be manipulated or observed as a whole. 
The states of such ensembles are commonly modelled as a 
density function on the state space of the individual systems. 

Section II and III extend results that were presented at the 53rd IEEE 
Conference on Decision and Control, see Q- 
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An example from process engineering are particle systems, 
where for example polymerization processes are modelled 
dynamically with a density function over the particle size that 
evolves dynamically |i4|, [[5). In cellular biology, populations 
of heterogeneous cells are described via a density function 
over the heterogeneous cellular variables, which change dy¬ 
namically due to cellular physiology [HI. The same model 
class is obtained not only by considering populations of 
similar individuals, but also by studying a single system with 
a probabilistic description of model uncertainty. An early 
example is from chemical kinetics, where this problem has 
been termed “stochastic sensitivity analysis” |l7l. 

These ensembles are now also beginning to attract attention 
in the control community, as they define a novel setting for 
classical systems theoretical problems, see e.g. [HI, [19*|. Based 
on their common problem formulation and premise, these 
problems have been grouped into what is now called ensemble 
control. Recent work in ensemble control has focused almost 
exclusively on the control part [fTOl . [fTTI . ifT^ . [fT^ , ITd). 
There the fundamental difficulty stems from the premise 
that all systems in an ensemble receive exactly the same 
control signal. Therefore, the control of such ensembles is 
picturesquely called broadcast control (cf. ca, di) as it can 
be thought of as one command being sent to all individual 
systems at the same time via a broadcast. The ensemble 
observability problem presented here can be considered the 
natural counterpart of these problems. Just as it is not possible 
to manipulate systems in the ensemble individually, one also 
cannot track or observe systems in the ensemble individually. 

An example for such state estimation problem for ensembles 
is given by the state estimation of heterogeneous cell popula¬ 
tions. Due to the experimental circumstances, measurement 
data in the context of cell populations consists mostly of 
population snapshots which are provided by high-throughput 
devices such as flow cytometers, as illustrated in Eigure 



Eig. I. An illustration of population snapshots. In each time step fi,... ,f 4 we 
have a snapshot of certain output values of a population. The crucial point 
is that in a snapshot, information relating an output value to the individual 
producing that output value is completely missing. 








While a population snapshot does contain a vast numbers 
of output measurements, this is at the drawback of losing any 
information relating output measurements to the individuals 
that produced them. Thus, while we get a good idea of how the 
population evolves as a whole, due to the aggregation of data, 
we cannot tell which point corresponds to which individual. 
In particular we do not know how a particular individual 
evolved over time. In fact, state estimation for heterogeneous 
cell populations portrays a rather drastic example for state 
estimation problems for ensembles. This is because measuring 
e.g. protein concentration within a cell often results in killing 
the cell, making it impossible to measure that cell again. 
Measurements given in different snapshots therefore stem 
from different cells within the population. For these reasons, 
the only adequate mathematical model is to view an output 
snapshot as a set of samples taken from the output distribution 
which evolves according to the dynamics of the structural 
system. Idealizing the vast number of samples as distributions, 
this immediately leads to the ensemble observability problem. 

State and parameter estimation for heterogeneous cell popu¬ 
lations from population snapshots has been recently considered 
from an applied point of view [[T7]| . ll^ . ifTSl . While a range of 
numerical methods for solving such inverse problems is known 
(see ifT^ for a recent review), the estimation problem has not 
been characterized from a systems theoretic point of view. For 
example, while solutions produced by ad hoc optimization 
based reconstruction algorithms should fit the measurement 
data well by construction, it is not at all clear what the precise 
relations between the solution of the algorithms and the real 
solution are OH- Neither are there results establishing the 
uniqueness of an estimation result a priori. 

Besides the introduction and motivation of the ensemble 
observability problem, our main contribution of this paper 
is a complete systems theoretic characterization of ensem¬ 
ble observability for linear systems. We apply a measure 
theoretic description of dynamical systems acting on prob¬ 
ability distributions If20l and establish a connection of the 
ensemble observability problem with a classical problem of 
mathematical tomography and integral geometry m, M- 
This is the problem of reconstructing an internal density from 
external Radon projections. In fact, as it turns out, the output 
distributions are Radon projections of the initial distribution. 
Combining results from the field of mathematical tomography 
with observability properties of the classical linear system, 
we obtain sufficient conditions for ensemble observability 
of continuous ensembles of LTI systems. The connection to 
mathematical tomography reveals geometric properties of the 
ensemble observability problem nicely and furthermore allows 
us to transfer the well-developed computational reconstruction 
methods from computed tomography to reconstruction prob¬ 
lems for dynamical ensemble systems. 

While the relation to tomography does provide us with a 
full algebraic characterization of ensemble observability, for a 
given LTI system the condition is hard to check in general. 
Our second approach, which is based on the reconstruction 
of the moments of the distribution, resolves this problem. 
In this approach, the dynamics of the moments of the state 
distribution are governed by systems of homogeneous forms. 


called tensor systems. Although these systems describe the 
dynamics of monomials, the systems themselves are linear. 
Therefore, another sufficient condition for ensemble observ¬ 
ability is the observability of every tensor system. In this 
systems theoretical approach, we can also easily formulate 
a feasible sufficient condition for ensemble observability of 
a specific class of single-output systems. As it turns out, 
the most general condition for ensemble observability is still 
very restrictive compared to that of classical observability. 
Lastly, we establish the equivalence of the tomographic and the 
moment-based approaches, thereby giving a complete picture 
of the problem. 

The organization of this paper is as follows. In Section|n|we 
introduce the ensemble observability problem mathematically 
and elaborate on the setup. We formulate the direct and inverse 
problems on the level of probability distributions and probabil¬ 
ity density functions. By inspecting these formulations from an 
inverse problems perspective, we identify them as tomography 
problems. In Section III we examine the theory of math¬ 
ematical tomography and integral geometry which provides 
results for the uniqueness of the reconstruction problem. We 
furthermore demonstrate how the tomographic reconstruction 
methods can be used for the practical reconstruction of initial 
state distributions. In Section we pursue an alternative 
approach based on reconstructing all the moments of the 
distribution. To this end, we utilize the framework of tensor 
systems, which describe the dynamics of monomials of the 
original state. As the dynamics of the monomial systems are 
linear, the ensemble observability problem can be described 
via the observability of these tensor systems. We establish, 
as one of our main results, the equivalence between the 
tomographic characterization of ensemble observability and 
this more systems theoretic characterization. The appeal of 
having a complete framework originating from two different 
view points is illustrated in several examples. 


IT Ensemble observability 

To capture the essence of those inverse problems described 
in the introduction, we consider an observability problem for 
a finite-dimensional linear time-invariant system 

v(0)=xo, 

y{t)^Cx{t), 

in which the initial state vq is a random vector, that is a 
multivariate random variable, with probability distribution Pq. 

Before going on to discussing this setup mathematically, we 
would like to elaborate more on the setup in view of population 
models and thereby introduce some terminology. In view of 
population models or ensembles, one can think of the setup as 
a description of a continuum of individual systems that have 
the same dynamics and measurement outputs given by O' 
but different initial states. We call 0 the structural system of 
the ensemble, and Pq the initial distribution which accounts 
for the specific heterogeneity of the initial states within the 
population. The fact that the initial state is a random vector 
leads to the fact that the output y{t) at any given time is also 
a random vector. Let IPy(f) denote its distribution, which we 
shall call output distribution. 



Now, analogously to the classical observability problem, 
the ensemble observability problem aims at reconstructing 
the initial distribution Pq from the evolution of the output 
distributions Py(f). A linear system, for which it is possible 
to uniquely reconstruct an initial distribution from the time- 
evolution of the output distribution will be called ensemble 
observable. 

Definition 1 (Ensemble observability of linear systems): 

A linear system is called ensemble observable if 

(Vt > 0 IPy(f)|P^ = JPy(f)|P") ^ JPq = JPq) 

for all continuous initial distributions Pq and Pq. 

It turns out that aiming for a general result for arbitrary 
non-parametric probability distributions is impossible. This is 
why we will need to restrict our attention to more specific 
classes of distributions for our theoretical results later. 

A. The direct problem in a measure theoretic framework 

Clearly, the study of ensemble observability is an inverse 
problem, just as the classical controllability and observability 
problems can be naturally viewed as well, see e.g. Il2^ . Thus, 
before we proceed with addressing the inverse problem, it is 
reasonable to first discuss the direct problem, i.e. how the 
output distribution evolves from the initial distribution under 
the finite-dimensional LTI system O' Having established this, 
we can then investigate to which extent we can use this direct 
problem to go the opposite direction for the inverse problem. 

Since the output distribution P^^^) is by definition the 
probability distribution of the random vector y{t), and since 
the output is related to the initial state via y{t) — Ce^^XQ, the 
distribution Py(f) is simply recognized as the push-forward 
measure of Pq under the mapping x C(fi^x. 



Fig. 2. This figure illustrates the connection between Pq Pv(/)- 
distribution is the push-forward measure of Pq under the mapping Ce^k 

This situation is illustrated in Figure and may be formu¬ 
lated as follows. For a measurable set By c M'” we have 

IP,.„)(B,):=IPo((C/')-‘(B,)), (2) 

that is, to compute the probability Fy(f^{By) for a Borel set 
By G one pulls back By via the mapping x Ce^^x 

to obtain the pre-image {Ce^^)~^{By) which is then measured 
via the probability measure Pq. 

Assuming furthermore that Pq has a probability density 
function po :MT ^ M, we may reformulate 0 as 

J{cA^)-fBy) 

This establishes the direct problem which can now be used to 
address the inverse problem of reconstructing Pq from 


B. The ensemble observability problem as an inverse problem 

Having established the forward relation ([^ between Pq and 
Py(f), and the forward relation ([^ between pQ and Py(f), we 
can already see which information we can infer about the 
inital distribution from the output distribution for the inverse 
problem. Since by assumption we know for any time the 
probability distribution Py(f), we now also know the value 

¥o({Ce^fi~^{By))^ [ podx 

for all t > 0 and all By G which is the probability 

that an initial state xq lies in the set {Ce^^)~^{By). We may 
illustrate the inverse problem as in Figure now. 



Fig. 3. This figure illustrates the problem that is at the core of our inverse 
problem: The reconstruction of an unkown density po from its integrals along 
sets {By) for different t > 0 and By G t^{R"'). 

Each single piece of information on the initial distribution 
obtained through the values /(ce^?) i(b ) Po dx does not provide 
us with a lot of knowledge about po in general, since the output 
matrix C G is in general not injective. Thus in general 
a set {Ce^^)~^{By) stretches to infinity due to the non-trivial 
kernel of the matrix Before studying this inverse problem 
in full generality, we illustrate in an example, how ensemble 
observability is related to observability of the structural system 
mathematically. 

Example 2: We consider the system 

x{t) ^ 0^ ^(0) ^^0, (4) 

see Figure with two different output matrices 

C' = (0 1) and C" = (l O). 

Concerning the output matrices, we notice that the first output 
matrix C' leads to an unobservable structural system, while C" 
renders the structural system observable. 

Returning to our inverse problem, we had already translated 
the ensemble observability problem to the inverse problem 
given by ([^. It is apparent that the properties of the kernels 

kerCe^^ = e“'^^(kerC) 

will play a crucial role in the reconstruction. In this example 
we have kerC' = span((l O)) for the first output matrix, 
which leads to the fact that the sets (C'(By), for a 
given measurable set By c M, are horizontal strips. But by 
only having at hand integrals over strips that have the same 
“orientation”, one cannot uniquely reconstruct po- 











Xi 


Fig. 4. The phase portrait of the system x{t) =Ax(t) given in 0 - 


This is because one can take an arbitrary pQ and shift it 
along the xi-axis, leaving the resulting output distributions 
unchanged. More generally, the idea is that due to unobserv¬ 
ability, one clearly ends up with a non-trivial intersection 

f| kerC/' = {xo e M" : C/'vq = 0}. (5) 

f>0 

Taking an abritrary density po we may shift this density along 
a non-zero vector taken out of the intersection. We end up 
with two different densities that give the same value when 
integrated over sets {Ce^^)~^{By). A proof of the general 
situation will be given in the next subsection. 

It is interesting to see now what will happen in the ob¬ 
servable case. There, we find that kerC" = span((0 l)). The 
evolution of e“'^^(kerC'') shows that the kernels are now tilted, 
which is also to be expected in virtue of a trivial intersection 
0- This is illustrated in Figure Thus, by considering 
measurements of the output distribution at different time 
points, we now obtain integrals of po along strips at different 
angles, thus gaining much more information about the initial 
density po than in the unobservable case. Yet, it is noted 
that the available range of angles is still constrained by the 
observability properties of the system. The intriguing question 
is, whether or not these different pieces of partial information 
can be put together so as to get full information on pQ. 



Xi 


Fig. 5. The evolution of the kernel kerC''e^' is given by transporting the 
kernel kerC'' with the flow of x{t) = Ax{t) back in time. This results in a 
tilting of the kernel kerC'', and their intersection is trivial, corresponding to 
{A,C'') being observable. The color intensities of the kernels indicate their 
advancement in time. 


At this point, readers familiar with such inverse problems 
may have already identified our problem as a tomography 
problem. To further highlight this, we reformulate the inverse 
problem ([^ just one last time. From the coarea formula it 
follows that the integral in ([^ is equal to 

f ( ^ = [ PO dy, 

provided that the matrix C is not full rank. Here the inner 
integration with respect to d^ denotes a surface integral over 
the affine subspaces defined by the equation Ce^^x — y. It is 
readily recognized that the density of the output is then 

Intuitively, passing from the integral ([^ to the surface integral 
0 can be thought of as a concentration of the information 
)Po dv about po by taking the “width” of a “strip” 
as illustrated in Figure to zero. 

To recap, the ensemble observability problem is now recast 
as the problem of reconstructing a density from its integrals 
over affine subspaces defined by the equation Ce^^x — y. Such 
problems of reconstructing a density from integrals over affine 
subspaces, or more generally manifolds, are unique to the 
theory of mathematical tomography and integral geometry. 
Our first approach is based on this natural connection. Before 
we proceed with studying this problem in the tomographic 
framework, we would like to give several remarks and fur¬ 
thermore give a necessary condition based on the findings of 
the illustrative example. 

Remark 1 (The case of full state measurements): The 
formulation 0 is restricted to matrices C that are not full 
rank. The case in which C is invertible, however, is trivial. 
There it suffices already to know the output distribution at 
only one instance > 0. To see this, assume without loss of 
generality that C — I such that ([^ specializes to 

To compute for a measureable set B G the probability 

Po(5), we just compute B) since 

=Po(c-'’'*(/'*B)) =Po(B) 

due to the invertibility of the matrix exponential. 

Another explanation is via the advection equation or Liou- 
ville equation 

= -div(p(t,x)F(x)), 

which is a partial differental equation that describes the 
evolution of a density v p{t,x) in a vector field x^-A F{x). 
The important fact to recall here is that the properties of the 
flow of the vector field are inherited so that the evolution of 
the density can be described by a flow as well, cf. Frobenius- 
Perron operators or transfer operators ll^ . It is noted however, 
that the measure theoretic explanation is more elementary and 
that the advection equation in fact originates (mathematically) 
from the measure theoretic framework, cf. Il20ll . 


























Remark 2: Having mentioned the Liouville equation, we 
further note that it is a linear partial differential equation, 
which suggests that one may study this problem in an 
infinite-dimensional linear systems framework Ii24l . In contrast 
to the typical output measurements considered in infinite¬ 
dimensional linear systems theory, our output is given by 


Py{t)iy) 


^det(CCT) ic-i(M) 


p{t,x) d^, 


which is still a surface integral, now taken over the propagated 
state density. Such integrals are typically not considered in 
infinite-dimensional linear systems theory, but are in fact a 
hallmark of tomography problems. Due to this fact, existing 
results in infinite-dimensional linear systems theory cannot be 
directly applied to our problem. 

Remark 3 (Incorporation of inputs): Notice that although 
we formulated the observability problem only for systems 0 
without input, our results do hold for linear systems with input 
as well. Given 


x(t) — Ax(t) + Bu(t)^ v(0)=xo, 
y(t) =Cv(t), 


with XQ a random vector and a known input function t ^u{t), 
we can easily deal with the input by considering this in the pre¬ 
images of the new mapping x^ Ce^^x-\-'^'^Bu(T)dT, 
which would not alter the analysis at all. Another way to 
deal with this is by canceling the convolution integral out 
by redefining the output. Here it is important that both the 
input matrix and the input signal be identical for all individuals 
among the population, so that fQCe^^^~'^^Bu{T)dT is the same 
for all individuals. This setup thus describes a population that 
is steered by one single signal in a broadcast manner, which 
is also the fundamental premise in ensemble control. 


C. Observability of structural system is necessary 

In this subsection, we present a first theoretical result 
concerning the necessity of observability of the structural 
system for the corresponding ensemble observability problem 
to admit a unique solution. This result is a straightforward 
generalization of our findings in the illustrative example. 

Theorem 3 (Necessary condition 4251/ . ^ ): Observability 
of (A,C) is a necessary condition for ensemble observability. 

Proof: We show that under the assumption that (A,C) is 
not observable, there exist initial densities pf pf' for which 


/ p[)dv= / po'ck 


(Ce^^-^iBy) 


for all t > 0 and By G In other words, we construct two 

distinct initial densities Pq and Pq which are indistinguishable 
from the output distributions. To this end, we fix an arbitrary 
probability density function pf. 

Since (A,C) is not observable, the intersection ([^, which 
is the unobservable subspace, is non-trivial. Therefore we can 
pick out of this unobservable subspace a non-zero vector v, 
and given that define a second probability density function by 


Po{x) :^pd{x-\-v). 


Clearly these two densities are distinct. Furthermore we have 


'iCe^n-HBy 


pd'{x) dx 


'{Ce^n-fBy) 


pd (x -f v) (k 
pd (x) dx 


for all t > 0 and B^ G 


Jv+{Ce^^)-\By) 

). Lastly, we observe that 




since v G kerCe^^ for all t > 0. This yields the claim. ■ 
This result generalizes our finding in the illustrative example 
where an unobservable structural system lead to the exis¬ 
tence of indistinguishable initial distributions in the ensemble 
observability problem. For the observable structural system 
in the example on the other hand, it seemed plausible that 
knowing integrals of the unknown density along infinitely 
many directions could be sufficient for ensemble observability. 
Yet, the situation is not as clear as for the necessary condition. 
In the next section, we work towards a resolution to this 
problem based on the connection to integral geometry and 
mathematical tomography. 


HI. Solution via mathematical tomography 

In this section we first give a brief background on the theory 
of mathematical tomography and afterwards introduce the 
mathematical framework. Given this framework, we proceed 
towards a solution of the ensemble observability problem. 


A. Background on mathematical tomography 

Classical tomography can be described as a way to deter¬ 
mine the internal structure of an object without having to open 
it up. Probably the best known example for a tomographic 
problem is computed tomography. Computed tomography is 
used for providing cross-sections of e.g. a part of a body for 
medical diagnosis and is based on the physical properties of an 
X-Ray beam. Let an X-Ray beam L passing through an object 
with density f{x) be parameterized by t, then the intensity I{t) 
along the X-Ray beam is attenuated according to the Beer- 
Lambert law ll^ 

3l(t) = -f{L(t))I{t). 

By virtue of this law, measuring the intensity I\ — 
of the beam after it went through the object and comparing it 
with the intensity /q at which it was emitted, we can compute 
the value 

/^/WdS = l0g(J). 

A. M. Cormack, one of the inventors of computed tomography 
identified this mathematical problem of reconstructing / from 
its line integrals fNT\ , ll^ . He proposed a reconstruction 
method for which he was awarded the Nobel prize in medicine 
and physiology in 1969 jointly with G. N. Hounsfield. Only 
later was it discovered that the mathematical problem was 
solved already 50 years earlier by mathematician J. Radon 
ll^ . for purely mathematical reasons. 




The general problem of tomography is the reconstruction of 
a function from its Radon transform f2^ . which in its classic 
form is a transformation that maps a two-dimensional scalar 
function / to the transform Rf which is defined on lines L, i.e. 
Rf{L) — Jif{x) d^. This is generalized to the n-dimensional 
case as follows, cf. 1221. 

Definition 4 (Radon transform): The Radon transform 
maps an integrable function / G L^(M",]R) to its transform 
Rf : X M — > M which is given as 


Rfico,p)^ f f{x)dS 

J {jcgK” : {(OyK)=p} 


whenever the integral exists. Furthermore the function R®/ 
given by {R(of){p) — Rf{co,p) is called Radon projection 
along (0^, or Radon projection orthogonal to co. 

At this point let us note the immediate connection to our 
problem ([^. We recognize that the output densities are nothing 
but some kind of Radon projections of the initial density 
along kerCe^k Thus, the inverse problem of reconstructing the 
initial density from the output densities is clearly a tomography 
problem. 

Remark 4: So far we only introduced the classical transform 
for hyperplanes, while the dimension of kerCe^^ need not be 
n—\. Before presenting results in full generality however, we 
would like to further illustrate theoretical reconstruction results 
only for the Radon transform defined for hyperplanes. This is 
for brevity of presentation as the mathematical framework for 
transforms for subspaces of arbitrary dimension becomes more 
involved. Luckily, in our probabilistic framework we may later 
take a slightly different route for the general problem, leading 
to a much shorter derivation. 

Let us now turn towards the solution of the reconstruction of 
a density from its Radon transforms. The key to the theoretical 
inversion of the Radon transform lies in its connection to 
the Fourier transform. If we define the n-dimensional Fourier 
transform as 


JR" 


then, the connection is given as follows. 

Theorem 5 (Projection Slice Theorem, cf i l22l/ ).' Consider an 
integrable function / G L^(]R",]R) and let O) be a unit vector. 
Then we have 


That is, the one-dimensional Fourier transform of the Radon 
projection along (O^ is equal to the n-dimensional Fourier 
transform of the density restricted to the “slice” O(0. As 
for the Fourier transform, we know that it is a bijection. 
Therefore, if we have for an integrable function, the Radon 
projections for all “directions” (O, then we know the n- 
dimensional Fourier transform of /. By bijectivity, we know / 
and are done. This is the classical solution to the tomography 
problem. Unfortunately this result does not apply directly 
to our problem. In contrast to the problem in computed 
tomography, we may not freely choose the directions at which 
we can gather Radon projections, but the directions kerCe^^ 
are inherently determined by the observability properties of 
the finite-dimensional system as was illustrated in Example 


B. Sufficient conditions for ensemble observability 

For the sufficient condition we draw to the mathematical 
tomography approach introduced before. To deal with the 
general case of subspaces of arbitrary dimension, and the 
limited direction problem, we turn towards a probabilistic 
approach to the tomography problem. 

First of all, we would like to reformulate the Projection Slice 
Theorem in the probabilistic framework. This probabilistic 
analogue is known as the Cramer-Wold device in probability 
theory ll^ . 

Theorem 6 (Cramer-Wold theorem): A distribution of a 
random vector X in M” is uniquely determined by the family 
of its push-forward distributions under the linear functionals 
X (v,x), where v G is a unit vector. 

Proof: The first step is to relate the characteristic function 
of the distributions of {v,X) to that of X via the simple 
calculation 

= £«'<"■■''> = (px(sv). 

Since the left-hand side is given for all v G and all 5 G M, 
by the above identity we know the characteristic function of 
X, and thus the distribution of X. ■ 

Remark 5: To see that this is in fact a probabilistic analogue 
of the Projection Slice Theorem, we observe that the left-hand 
side is simply the Fourier transform (modulo i i—> —i) of the 
density of viXi -f... whereas the characteristic function 
on the right-hand side is the Fourier transform of the joint 
density. The density of viXi -f... v„X„ on the other hand is 
nothing but the Radon projection orthogonal to v. 

It is interesting to note that before the connection between 
tomography problems and its probabilistic counterpart was 
pointed out in ED, cf. i22\ . the developments in both fields 
happened independently. The Cramer-Wold device is used in 
probability theory mostly as a conceptional tool, to be more 
precise, it is used as a way to reduce a high-dimensional prob¬ 
lem to a one-dimensional problem to which one can then apply 
well-established results. Our use of this result here is slightly 
different as we exploit its analogy to tomography problems 
explicitly to tackle the inverse problem of reconstructing the 
initial density from the output density. 

For our inverse problem we compute the characteristic 
function of the output distribution to find 

= = (pxfi(Ce^fi^s). 

That is, the output distributions yield information on the 
characteristic function of the initial state distribution on the 
subspaces Im(Ce^^)^ = (kexCe^fi^. This simple insight will 
be key in formulating characterizations for the uniqueness of 
the density reconstruction problem. We note however, that 
from Q we cannot in general gather information about the 
whole characteristic function due to the fact that (Ce^fit>Q 
is clearly parameterized by the scalar t > 0. This is exactly 
where we need to draw on analyticity properties of the char¬ 
acteristic function. In mathematical tomography, analyticity of 
the Fourier transform is typically guaranteed by the standard 
assumption of bounded support of the considered densities. 


For the ensemble observability problem, the assumption of 
bounded support would exclude e.g. Gaussian distributions. 
We show that the assumption of bounded support can be 
replaced by a more general assumption for the characteris¬ 
tic function (px^. We assume that for the considered initial 
distributions Pq, the mapping 5 1 —> (Pxq{sv) — <P(v,Xo)(‘^)’ all 
non-zero v G M", is real analytic, i.e. can be locally written 
as a power series about every point in M. The role of this 
assumption will be further illuminated in the moment-based 
approach in the next section. 

We begin by formulating our main result of this section, 
which gives a first sufficient condition for ensemble observ¬ 
ability with respect to a specific class of initial distributions. 
Our result is inspired by uniqueness results for the tomo¬ 
graphic reconstruction problem, cf. Theorem 5.2 in ll^ and 
Theorem 3.142 in ll2^ . which give the most relaxed charac¬ 
terization known in the mathematical tomography literature. 

Theorem 7: A linear system (A,C) is ensemble observable 
for the class of initial distributions for which s 1 —^ (px^isv), for 
all non-zero v G M”, is real analytic, if 

IJ (ker ^ = IJ Im(C/') ^ (8) 

t>0 t>0 

is not contained in a proper algebraic subvariety of M". 

Thus, a sufficient condition is that the directions generated 
by 1 1 —^ Ce^^ are rich in the sense that we cannot find a proper 
algebraic variety in which the union ([^ is contained in. Recall 
that an algebraic variety of M" is the zero set of a polynomial, 
and that it is proper if it is not M". 

Proof: We show that under the analyticity condition on 
(jPxo and the assumption that the union ([^ is not contained in a 
proper algebraic variety, knowing the characteristic function on 
is sufficient to know the characteristic function everywhere. 

First of all, we consider two (px^ and (px^ such that their 
difference h cpx'^ — <Px" vanishes on the union ([^, i.e. 

= 0 for all ^ e lJlm(Ce^')^- (9) 

f>0 

By analyticity, we can write for any non-zero G M" and any 
sufficiently small A, 


hm) 


p=Q 


( 10 ) 


Therein ap{^) = ^ (E((<^,X^)^) -E((<^,X^y)), which is a 
homogeneous polynomial of degree p, see Section IV-A| 

Now for an arbitrary ^ G lJr>oIna(Ce^^)^, by analyticity, 
the condition h{X^) — 0, for all A in a neighborhood around 
the origin, is equivalent to the vanishing of the polynomials 
Up on the union The union f) is thus contained in the 
algebraic varieties defined by Up. Thus, by the assumption that 
the union is not contained in a proper algebraic variety, 
all polynomials must be trivial, i.e. Op = 0. 

Since for all non-zero ^ G M", the mapping A 1 —> /z(A<^) is 
real analytic in a neighborhood of any point of the real axis, 
A H-> h{X^) is completely determined by its power series about 
the origin, which is zero. Therefore we conclude that /z = 0, 
i.e. cpx' — cpx", and thus lastly Xq —Xf. ■ 


From the main result Theorem |7| we now further derive 
a sufficient condition based on the special case that occurs 
when the affine subspaces that one is integrating over are 
one-dimensional. This is nowadays considered a classic result 
arising from the study of the X-Ray transform In view of 
the ensemble observability problem, the assumptions are quite 
restrictive though. 

Theorem 8: If (A,C) is observable, and rank C — n—\, then 
the union ([^ is not contained in a proper algebraic variety. 

Proof: With rank C — n—\, the dimension of (kerCe^^)-^ 
is also n—l. Due to the observability of (A,C), the intersec¬ 
tion 0 is trivial and thus (kerCe^^)-^ with t > 0, constitutes 
an infinite family of pairwise distinct hyperplanes. More in 
detail, for an observable system (A,C), the mapping 1 1 —> Ce^^ 
cannot have a discrete image • • •} for arbitrary 

(at most) countable times • • • • To see this, define the sets 

21 {t > 0 : Ce^^ = 

Due to continuity of 1 1 —^ Ce^f the sets Ty- are closed. Suppose 
now for contradiction that Ua:=i 2 21 = [0,oo), then the Baire 

category theorem yields the existence of an index k* so that 21* 
has non-empty interior, i.e. contains an interval. But this would 
contradict observability of the system during this interval. 
Lastly, an infinite family of distinct hyperplanes cannot be 
contained in a proper algebraic variety. ■ 

Furthermore we would like to highlight the remarkable 
special case of n = 2, in which the richness property is satisfied 
by observability of (A,C) alone. 

Corollary 9: For an observable two-dimensional system 
(A,C), the union ([8]) is not contained in a proper algebraic 
variety. 

One question that immediately comes up is whether or not 
an observable system (A,C) already generates “directions” rich 
enough such that the union ([8j is not contained in an algebraic 
subvariety. The following example answers this question to the 
negative. 

Example 10: Recall that from Theorem and Corollary 
we learned that in order to find a system that is observable, 
but for which the union ([^ is contained in a proper algebraic 
variety, we need to consider systems with degree of at least 
three. Consider the system 

/O 0 

x{t) = j 0 “1 

\0 0 
y(t) = (l 1 l)v(t), 

which is easily seen to be observable in the classical sense 
since the diagonal entries are pairwise distinct and every entry 
in the output matrix is non-zero. Now if we compute 

we see that the algebraic variety given by the homogeneous 
polynomial equation 

( 12 ) 

contains the union ([^, thus violating the richness condition in 
Theorem |7l 



( 11 ) 





Note that, since Theorem is only sufficient, we can not 
conclude at this point that the system GD is not ensemble 
observable. We will come back to this example later, after 
having introduced the moment-based approach to ensemble 
observability. There we will see that we can actually construct 
distinct, indistinguishable (Gaussian) initial distributions for 
this example. Before that, in the next subsection we show 
how the well-developed computational reconstruction methods 
from computed tomography can be used for the practical 
reconstruction of initial state distributions. 

C. Practical reconstruction based on tomography methods 

In this subsection, we demonstrate that the tomography 
framework can be also used as a means to practically recon¬ 
struct an unknown initial distribution. In particular, we also 
give a computational solution to the ensemble observability 
problem in Example 

40) 

y(0= (1 0)x(0. 

Suppose that the unknown initial density is given by the 
bimodal density 

po = 0.7pi -f 0.3p2 

where p\ is the density of a normal distribution with mean 
jUi = (1,2) and covariance matrix Li = diag(0.3^,0.3^) and p 2 
is the density of a normal distribution with mean /i 2 = (2,1) 
and covariance matrix E 2 = diag(0.2^,0.2^). 

For the tomographic reconstruction of the initial density, 
we can employ Algebraic Reconstruction Techniques (ART), 
see e.g. El These reconstruction techniques are based on 
the idea of discretizing the state space into pixels so that 
the unknown distribution is expressed as a piecewise constant 
function that is constant on a fixed pixel. The strip integrals 
/(Ce^q 7*0^ occuring in the tomography problem are thus 
approximated by weighted sums of the values of the pixels that 
the strip passes through. The values Fy(^f^{By) are approximated 
via measured samples (10^ samples for each time point in this 
example) from the output distribution. This results in a large 
system of linear equations which is then solved iteratively 
using e.g. Kaczmarz method. The result obtained by ART is 
shown in Figure for different iteration numbers. For smaller 
numbers of iterations, we witness a well-known “distortion” 
effect which is due to the limited direction situation, cf. 
Figure The drawback of increasing the number of iterations 
is the numerical noise that gets introduced. 

Remark 6: While Theorem [8] combined with Theorem [7] 
explicitly requires measurement data of the output densities 
for infinitely many time points, it is clear that in practice one 
can only use a finite number of measured output densities. 
In other words, our results do not apply directly. To make 
matters worse, one can even show that there are infinitely 
many different densities that produce a finite set of Radon 
projections exactly, see e.g. Il22l . Il33l . Nevertheless we can 
expect the practical reconstruction to yield a small estimation 
error for sufficiently many measured output densities. 


In conclusion, while the established methods of tomography 
seem to be generally suited for the ensemble state recon¬ 
struction problem as introduced here, there are still specific 
challenges stemming from the dynamic origin of this problem. 
However, an exhaustive discussion of different tomographic 
methods for the ensemble state reconstruction problem is 
beyond the scope of this paper. 



x^(0) x^(0) 



x^(0) x^(0) 


Fig. 6. The reconstruction of the bimodal distribution using ART with 
different iteration numbers 1, 3, 5 and 7. For a smaller number of iterations, 
a “distortion” effect is witnessed. This is known to be caused by the limited 
direction situation. By increasing the numbers of iterations, however, this 
effect is in large parts suppressed despite the limited direction situation. 


IV. Solution via observability of moments 

In this section we present an alternative, dual, route to 
the ensemble observability problem based on the idea of 
directly reconstructing the moments of the initial density. The 
dynamics of the moments can be described by so-called tensor 
systems, which are systems of homogeneous p-forms in the 
components of the state x as considered by R. W. Brockett in 
li34l and thereafter extensively studied within the observability 
of linear systems with polynomial output Ii35l . Ii3^ . IITtII . [f38l . 

The problem of reconstructing the moments of the density 
then boils down to studying the observability of all these tensor 
systems which happen to be linear systems. The assumption of 
real analyticity of the characteristic function of the unknown 
density from our previous approach does in fact guarantee 
that moments of all orders exist and that these moments 
furthermore determine the density uniquely. We show as one of 
our main results, that both frameworks are inherently related, 
but that the moment-based approach in fact allows for even 
more general results. Having established this link, we continue 
with studying the ensemble observability problem in this more 
systems theoretic framework. 






















































































A. The moment problem 

We start with a brief introduction to moments of multivariate 
probability distributions and the famous moment problem. 
Let X be an n-dimensional random vector with a probability 
distribution P. For a multi-index 


(X — (OJl) tX2 ) • • • ) (Xn )) 


i.e. an n-tuple of non-negative integers, we call 

dP(x), 

Jw 

a moment of order \a \ — ai a 2 -\ - \-(Xn—p of P, if 


Mp-.^ 



|v||^dP(x) < oo. 


A classical problem in probability theory is the question of 
whether or not a probability distribution is uniquely deter¬ 
mined by its moments. This is known as the moment problem. 
If a distribution with finite moments is determined uniquely 
by its moments, the distribution is called moment-determinate. 

Of course, under the assumption that a probability distri¬ 
bution is moment-determinate, our problem of reconstructing 
the initial state distribution is equivalent to the reconstruction 
of the moments of the initial state distribution. The close con¬ 
nection to our approach in the previous section is given by the 
observation that the mapping (px (^v) is the characteristic 
function of the random variable (v,X). Real analyticity of 
(Piypt) on M implies that (P{y,x) is completely determined by 
its power series about the origin. 


^{v,X){s) 


p=Q 


pi 


cf. Section XV.4 in tf^ . Furthermore, we have the well-known 
fact that the derivatives at the origin are given by 




i.e. (P{v,x), due to analyticity, is uniquely determined by the 
moments E{{v,X)p). These moments can be readily computed 
via the multinomial theorem as 


E((v.xn= £ 

\a\=p V«/ 

cf. the homogeneous polynomial Op in ( fTO] ). Now for two 
distributions P' and P'' with the same moments, all their one¬ 
dimensional projections for non-zero v G M” have the same 
moments, and are thus equal. By virtue of the Cramer-Wold 
theorem, we have P' = P'', i.e. moment-determinacy of the 
initial distributions. 

Besides the richness condition, the approach in the previous 
section is essentially based on moment-determinacy of the 
one-dimensional projections (v,Xo) and eventually moment- 
determinacy of Xq, which are guaranteed by the real analyticity 
assumption. In this section, we pursue a more direct approach 
which aims at directly reconstructing the moments of the initial 
distributions without having to go the route over analyticity 
arguments. In the following, we introduce the framework of 
so-called tensor systems in which the moment dynamics can 
be conveniently described. 


B. Background on tensor systems 

We begin by giving a brief review of tensor systems. This 
will also fix notation for the subsequent analysis. For a more 
complete introduction to tensor systems we refer to li^ . 
Recall that for x G M", the vector denotes the vector of 
weighted p-forms in the components of x, i.e. 



By a standard combinatorial “stars and bars” argument, we 
conclude that the dimension of is N{n,p) 

More precisely, x^^l is the vector of weighted powers x“ with 
I ct I = p, where the entries of x^p^ are ordered lexicographically 
in a decreasing order according to the multi-indices which we 
denote . In this notation, we would write 









T 


where Wp{a) '\/p!/(ai!...«„!). 

Now, if we have an equation y — Cx, then it can be seen 
that there is also a linear dependency between the vectors y^P^ 
and x^P^ which we denote by y^P^ — C^p^x^p\ If we consider 
a linear differential equation x{t) —Ax{t), then it can also be 
seen that x^p^ (?) satisfies a linear differential equation in which 
we denote the system matrix as i.e. x^P\t) — A[p^x^P\t). 
Moreover, observe that the so-called tensor system 

has the solution 

y^\t) = (Ce^'xo)[^] = {Ce^^)^P^x[P\ 


Lastly, observe that by considering the transformation 

/wp{a^) \ 

Jp] ._ . Jp] 

A ,— • ^ 5 

V Wp{a^^^^P^)) 


(14) 


between weighted p-forms x^^ and unweighted p-forms 
observability of the LTI system for the unweighted p-forms 
is not altered. More precisely, by taking expectations in the 
dynamics of the unweighted p-forms, we obtain explicitly the 
dynamics of the moments of order p. 


E[y^P^]^dP^WE[x^P^], 


where W denotes the transformation matrix in ([T^. 


C. Bridging the tomography- and moment-based approaches 

In this subsection we present a bridge between the 
tomography-based approach with moment-based approaches 
based on tensor systems. We thereby establish a complete 
solution with two different but dual view points. The main 
result in this subsection is an explicit connection between 
the richness condition in Theorem and the unobservable 
subspace of a tensor system. 






Lemma 11: The union lJf>o is contained in the 

algebraic variety defined by 

= 0 

if and only if the coefficient vector a is contained in the 
unobservable subspace of 

Proof: The condition that the union lJf>o Ini(Ce^^)^ is 
contained in the algebraic variety given by of' = 0 is 
equivalent to 

for all r > 0 and z G W”. Using {AB)^^ —A^p^B^p^ and 

(aT)W = (a[^])T, (15) 

(see e.g. EH) we arrive at 

aT(Ce^^)WT^b] ^ {{Ce^^)ip]ayz^P^ = 0, (16) 

for all r > 0 and z G 

Now we recall that for a vector v G the fact that 

^Tzb] _ Q 

for all z £ M" is equivalent to v = 0; the vanishing of a 
polynomial for all values of its variables is equivalent to 
the vanishing of all coefficients. Thus, 0 is equivalent 
to a G ker((Ce^^)b]) = ker(Cb]e'^b]^) for all r > 0. In other 
words, the coefficient vector a is contained in the unobservable 
subspace of the tensor system 0. which yields the claim. ■ 

Remark 7: The reason why weights were introduced in the 
beginning is exactly to guarantee that ( [TS] ) is true, cf. [|34). This 
allows us to obtain a direct connection between coefficient 
vectors of the algebraic variety and unobservable states of the 
tensor system: Given a vector of the unobservable subspace of 
a tensor system, this very same vector is also the coefficient 
vector of a homogeneous polynomial that defines an algebraic 
variety in which the union ([^ is contained in. 

Finally, we can state our main result which gives a unifying, 
and also more general, sufficient condition for ensemble 
observability based on the observability of the tensor systems. 

Theorem 12: The union lJf>o Ini(Ce^^)^ is not contained in 
a proper algebraic variety if and only if the systems 

>!(/) =A[,|x['’l(r) 

are observable for all p G N. Under these equivalent conditions, 
the system (A,C) is ensemble observable for the class of 
moment-determinate initial distributions. 

To conclude, our main result Theorem [T^ shows that the 
tomography approach is in fact in perfect accordance with 
this moment approach. In fact, through the systems theoretic 
approach we learn that in Theorem moment-determinacy 
of the considered initial state distributions alone is sufficient, 
i.e. that the stronger assumption of real analyticity was in fact a 
technical assumption for Section|^ One reason for this is that 
the idea of the moment-based approach is more direct: Given 
the output distributions, we can compute their moments and 
then reconstruct the moments of the initial state distribution 
by virtue of observability of the tensor systems. 


In the following two examples, we illustrate the concepts 
of our theoretical framework introduced so far on the linear 


system considered in Example 10 


Example 13: We reconsider system 0. We begin with 
illustrating the result given in Lemma [TT] First of all, we have 
for X G and p — 2 the state of the second order tensor 
system 


X^^^ — (x^ \/2 viX2 \l2x\Xi X 2 \/2v2X3 X 3 ) 


Thus, for 

a=(0 0 1 0 0 )" 

the equation a^xt^l = 0 defines the same variety as 0. 
Furthermore it follows from Ce^^ — (l ^ ^ e and the 
definition of {Ce^^Y^^ via {Cc^^xq)^'^^ — {Ce^^)^^^XQ^ that 

(Ce^f)[2] = (l y/2e-^ V2e-^^ . 


It is quickly verified that 

i.e. that Lemma m holds true. 

In the second example, we illuminate the meaning of the 
algebraic variety defined by 0 to the ensemble observability 
problem. In particular, we show that there exist observable 
systems (A,C) which are not ensemble observable for the class 
of moment-determinate initial distributions. 

Corollary 14: There are systems (A,C) which are observable 
in the classical sense, but are not ensemble observable for the 
class of moment-determinate initial distributions. 

Example 15: We consider again the observable system 0 
and illustrate what interpretation the non-observability of the 
second order tensor system has in terms of moments, as well 
as the consequences for the ensemble observability problem. 
First of all, recall that we can switch between weighted and 
unweighted tensor vectors via the change of coordinates 0 
given by 

:= diag(l, -\/2, 1 ,a/2 , 1) xlfl 


We recall that if we take expectations in the dynamics of the 
unweighted monomials x\f\ we get exactly the dynamics of 
the second order moments which is governed by the very same 
LTI system. Thus the transformed coordinate vector 

a„=(0 0 1 0 0)^ 

is contained in the unobservable subspace of the unweighted 
tensor system, i.e. the system describing the second order 
moments. This means that if we add in the covariance matrix 

/ E[Xf] E[XiX2] E[XiX3]\ 

Fxo = E[XiX2] E[X|] E[X2X3] -ppJ 

Ve[XiX3] E[X2X3] E[X^] J 

a sufficiently small A G M to the (2,2) element and — f to the 
(1,3) and (3,1) elements respectively such that the resulting 
matrix Fxq stays positive definite, then this will not be noticed 
in the output E[y2(^)]. 



In the special case that we consider Gaussian distributions, 
we can construct two distinct but indistinguishable initial 
distributions, disproving ensemble observability of system 
Q. For a concrete example of such indistinguishable ini¬ 
tial state distributions, we can consider Pq = and 


with 

/ct2 \ 



0 


1 '= Ct2 

2 :" = 

0 

2ct2 

0 

V 



0 

Ct2 ) 


i.e. we chose A = cr^. It is readily verified that for this choice, 
l!' is positive definite (e.g. via Gershgorin circle theorem). 

Now for system 0. we recall that = ( 1 e ^ e 

which we use for computing the covariance of the output via 
the well-known equation 

<j;,„ = (C^')Zx„{Ce^Y. (17) 

It is readily verified via ( fTT] ) that both covariance matrices Z' 
and Z" lead to a variance of 

(l + g-^' + e-^OcT^ 

of the output distribution. Since the output distribution is also 
a normal distribution, it is uniquely determined by its mean 
and variance, which are both the same for P'^^^ and by 
construction. Thus, system G3 which is observable in the 
classical sense is not ensemble observable, and in particular 
also gives an example for Corollary 

Note that it is not immediately clear in general that the non¬ 
observability of a particular moment leads to a non-uniqueness 
of the reconstruction of the initial state distribution. This is 
because we may not be able to simply add an element of 
the unobservable subspace of the moment to some solution so 
that we have come up with a moment sequence of a different 
distribution that has the same output distributions. Rather, it is 
well-known that moments further have to satisfy additional 
conditions such as positive definiteness of the covariance 
matrix mentioned in the example. This obstructs the way 
for a non-uniqueness argument which is based on linearity, 
and shows yet again that the problem we are studying is not 
fully linear. The nonlinearity in this seemingly linear problem 
is naturally introduced by the consideration of probability 
distributions. 

D. A feasible condition for specific single-output systems 

In this subsection we derive a more practical condition 
on the matrices A and C such that for all p G N the pth 
order tensor systems are observable. Clearly, checking the 
observability of infinitely many tensor systems a priori is 
impossible. In the following we derive a feasible necessary and 
sufficient condition for the observability of all tensor systems 
for a specific class of systems. We focus here on the class of 
observable single-output systems in which the system matrix 
has distinct real eigenvalues. The resulting condition turns out 
to be very restrictive, which suggests yet again that the class 
of ensemble observable systems must be much smaller than 
the class of observable systems. 


Theorem 16: Consider an observable single-output system, 
where the system matrix A has distinct real eigenvalues 
Ai,...,A„. Then the respective tensor systems are observable 
for all orders, if and only if the differences 

A,2 ) • • •) Ajj A]^ 

are linearly independent over Q. 

Proof: Recall that any observable single-output system 
with a system matrix having distinct real eigenvalues can be 
transformed so that 

A = diag(Ai,...,A„) 

with distinct real values A, on the diagonal, and 
C = (ci ... c„) 

where every scalar entry c/ is non-zero, are the new system 
and output matrix respectively. 

It is not hard to see that the matrix is then also a 
diagonal matrix, for which the entries on the diagonal are sums 
of the form 

OJiAi -f Ot2^2 + • • • + Otn^m 

where ct is a multi-index of order p, i.e. cti H- \-0Cn—p. 

Now observe that if some diagonal entries of Aj^j have the 
same value A, then in view of a Hautus test for the pth order 
tensor system, the difference Aj^j — A/ will have a rank loss 
that is greater than one. The output matrix being a row 
vector however can only accomodate for exactly one rank loss. 
The question of asking whether there exists p G N so that 

«'A/ = ^ a'/hi 

i=l i=l 

for different multi-indices a' and a" of order p, can be seen 
to be equivalent to the question of asking whether there exists 
a non-zero vector z — {zi ... Zn) of integers such that 

zi H- \-Zn = 0 and ziAi H-hz«A„ = 0. 

This is, as one can show, equivalent to A 2 — Ai,..., A„ — Ai 
being linearly independent over Q. ■ 

The condition that real distinct eigenvalues now further have 
to satisfy a linear independence over the rational numbers was 
also reported in e.g. If^ . within the study of Carleman 
linearizations. 

E. Incorporating independence of initial state components 

In this subsection we examine what is to be gained when one 
has knowledge about the components of Xq being independent. 
As before, our standing assumption is moment-determinacy 
of the considered probability distributions. To recap, for these 
distributions, the ensemble observability problem is equivalent 
to the reconstruction of all moments. Moreover, we assume 
now that the components of the random initial state Xq are 
independent, which can be formulated as follows. 

Assumption 1: The components Xq^/ of the random vector 
Xo are independent, or in other words, the initial state density 
is decomposable as po(v) = RLi PoA^d- 




The consideration of this assumption is relevant for practical 
problems in which initial state distributions do satisfy such 
independence assumption. Furthermore, the following analysis 
is a nice application of our results and shows what exactly is to 
be gained from the independence assumption mathematically. 
Such independence assumption has not been considered so far 
in the tomography literature. 

The first trick in our consideration is to consider cumulants 
rather than moments. It should be kept in mind that if moments 
exist, then cumulants do exist and that one then can directly 
compute moments from cumulants. Recall that for a (scalar) 
random variable Y, the cumulant-generating function is defined 
by 

*(()=logE[e'i'] 


from which the cumulants Kp are obtained via a power series 
expansion, i.e. 

p=i P- 

Recall that the pth cumulant is homogeneous of degree p 
in the sense that for any constant c G M, 


Kp{cY)^cPKp{Y). 

Furthermore, if Y’ and Y” are two independent random vari¬ 
ables, we have additivity Kp{Y' + Y") — Kp{Y') + Kp{Y"). The 
fact that independence can be naturally inserted via additivity 
is what makes using cumulants particularly attractive. 

Now, using the independence of initial state components 
i— and homogeneity of the cumulants, we have 

for arbitrary s G M'”, 

Kp({s,Ce^‘Xo)) = £((Ce-’')^i),'’K'p(Xo,,-). (18) 

/=1 


Therein, the left-hand side is known, and Kp{XQj), i— 1,... 
are the unknowns that we would like to solve for. We may 
rewrite 0 more compactly as 




Uce^fsr, 


\ {^0,n ) / 


(19) 


where x*p denotes the plh element-wise power of a vector x. 
In view of our novel theoretical framework, we can see that 
can be uniquely solved for the pth order cumulants of 
Xo, if and only if lJf>Qlm(Ce^^)^ is not contained in a proper 
algebraic variety of the form 


dix^ -\-d 2 X 2 H-= 0. (20) 


Thus, by placing an independence assumption on the initial 
state distribution, we shrink the class of algebraic varieties 
to be considered in the richness condition to those algebraic 
varieties defined by polynomials of degree p, which do not 
have cross terms. Hereby we conveniently say that a polyno¬ 
mial of degree p does not have a cross-term, if all monomials 
occurring in the polynomial are of the form 


Clearly this restriction makes it in principle easier for a 
system to be ensemble observable. From the tensor system 
viewpoint, lJf>oIm(Ce^^)^ is not contained in an algebraic 
variety of the form ( | 20 ] ) if and only if the intersection of the 
unobservable subspace of (A[p],C[^]) with the subspace 

[a G — dh entry of x^^^ is a cross-term} 

( 21 ) 

is trivial. This means that in view of testing observability of the 
tensor systems via a Hautus test, we do not need to consider 
every eigenvector of but only those that additionally lie 
in the set ( |^ . 

Although, as we pointed out, it is in principle easier to 
reconstruct an initial state distribution which is known to 
satisfy the independence assumption, it is in general also not 
the case that observability of (A,C) alone implies ensemble 
observability. We show this explicitly by constructing a con¬ 
crete system in the following example. 

Example 17: Given a three-dimensional system, it is not 
hard to verify that the dynamics of the second order moments 

(E[x}] E[xiV 2 ] E[xiX3] E[x 2 ] E[x2V3] E[x 3 ])^, 

is described by the system matrix 


/2oii 

2oi2 

2oi3 

0 

0 

0 \ 
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«23 
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«13 

0 

<331 

«32 

«11 +«33 
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«12 

ai 3 

0 

2o21 

0 

2o22 

2o23 

0 

0 

«31 

«21 

«32 

022 + «33 

«23 

V 0 

0 

2o3i 

0 

2o32 

2033/ 


which we denote Apj. Note that we dropped the normalization 
as it is not relevant for this analysis. The second order moment 
of the output E[y^] is related to the second order moments of 
the state by the output matrix 

:= (c} 2 ciC2 2ciC3 C 2 2c2C3 c}). 

To recap, for the ensemble observability analysis under the 
independence assumption we only need to consider eigenvec¬ 
tors where the second, third and fifth entries are zero. To 
construct a counterexample, we need to find an observable 
(A,C) failing the constrained Hautus test for the second order 
tensor system. In order to fail the constrained Hautus test, we 
need to be able to find a solution v to the following eigenvalue 
problem 


/ 2oiivi \ 


/vA 

«2lVl +O12V2 


0 

O31V1 -f O13V3 

= A 

0 

2o22V2 

V2 

O32V2+O23V3 


0 

\ 2033V3 / 


vv 


which satisfies additionally the second condition = 0 . 
First of all, it is seen from the eigenvalue problem that 

an — A 

needs to hold. Now if we choose vi = 1 ,V 2 = 1 and V 3 = — 1, 
the following equations need to hold as well 


«21+ai2=0, 031—ai3=0, 032—023=0. 










Moreover, if we choose 

C=(l 1 V2), 

then = 0. Based on this consideration, we come up with 

o\ 

0 x(t), y(0 = (l 1 V2)x{t), 

0 / 

which can be verified to be observable, but which fails the 
constrained Hautus test for by construction. 

For two concrete indistinguishable initial distributions, we 
can consider Pq = and Pq = with 

/^2 \ 





1 ' = 




- ( 7 ^ 




that we constructed via v in the unobservable subspace of 

With ( [TT] ) and the solution of the 
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(A[ 2 ],C[^]), cf. Example 
LTI system, 

= (cos(t) + sin(t) cos(t) — sin(t) \/2), 
we obtain for both initial distributions an output variance of 

Lastly, we note that Example shows that system E) 
is a system which becomes ensemble observable with the 
additional assumption of independence. This can be seen from 
the unobservable subspace which is spanned by that is not 
contained in ([21]). Another way to see this is through the fact 
that A = 0 must hold in Example [Ts] to fullfill the independence 
assumption, thus obstructing us from constructing a different 
but indistinguishable initial distribution. 


V. Conclusions 

We introduced the ensemble observability problem as the 
problem of reconstructing the distribution of initial states in a 
population of finite-dimensional systems from knowledge of 
the output distributions over time. The core of this problem 
turned out to be an inverse problem of tomographic type: 
the reconstruction of a density function from the knowledge 
of its projections which are given as integrals along a set 
of affine subspaces. This observation allowed us to apply 
the well-developed theory of mathematical tomography to the 
estimation problem. Underlying this observation is in fact 
a deeper connection between observability and tomography: 
both problems are about inferring information about internal 
variables of a system, or an object, respectively, from external 
measurements which are projections of these variables. 

The key contribution from the tomographic approach is 
a first sufficient condition for ensemble observability of a 
linear system under the assumption that the initial distribu¬ 
tions satisfy a certain regularity property. The tomographic 
approach also gives well-developed computational methods for 
actually solving the reconstruction problem in practical cases. 
However, special characteristics of the underlying dynamic 
problem, such as the limited range of observation “angles” or 
the resolution of measurement times, may need more attention 
in numerical approaches. 


While the tomographic approach gives a rather static picture 
of the reconstruction problem, we also pursued a more systems 
theoretic approach by considering the dynamics of the mo¬ 
ments. This led us to the study of tensor systems, as considered 
earlier by R. W. Brockett [l34|. Under the previously mentioned 
assumptions on the initial distributions, the initial distributions 
are fully determined by their moments. Accordingly, ensemble 
observability can be inferred from the observability of all 
tensor systems in this case. Eurthermore, we gave a tractable 
condition for the observability of all tensor systems in a 
specific case of systems with a scalar output. 

While individual systems are dynamic in this study, the 
ensembles are restricted to be rather static, in the sense that no 
individual systems are added to or removed from the ensemble. 
Many ensembles in practical applications are very dynamic 
however; for example, cells in a cell population may divide 
or die. Hence, it is also of interest to extend the concept of 
ensemble observability to dynamic populations. The measure 
theoretic approach from which we set off in this study looks 
also promising for this generalization. 
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